{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ollama Embeddings With Langchain\n",
    "\n",
    "- Author: [Gwangwon Jung](https://github.com/pupba)\n",
    "- Peer Review : [Teddy Lee](https://github.com/teddylee777), [ro__o_jun](https://github.com/ro-jun), [BokyungisaGod](https://github.com/BokyungisaGod), [Youngjun cho](https://github.com/choincnp)\n",
    "- Proofread : [Youngjun cho](https://github.com/choincnp)\n",
    "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/08-Embedding/05-OllamaEmbeddings.ipynb)[![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/08-Embedding/05-OllamaEmbeddings.ipynb)\n",
    "## Overview\n",
    "\n",
    "This tutorial covers how to perform ```Text Embedding``` using ```Ollama``` and ```Langchain```.\n",
    "\n",
    "```Ollama``` is an open-source project that allows you to easily serve models locally.\n",
    "\n",
    "In this tutorial, we will create a simple example to measure the similarity between ```Documents``` and an input ```Query``` using ```Ollama``` and ```Langchain```.\n",
    "\n",
    "![example](./assets/example-flow-ollama-embedding-cal-similarity.png)\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "- [Overview](#overview)\n",
    "- [Environement Setup](#environment-setup)\n",
    "- [Ollama Install and Model Serving](#ollama-install-and-model-serving)\n",
    "- [Identify Supported Embedding Models and Serving Model](#identify-supported-embedding-models-and-serving-model)\n",
    "- [Model Load and Embedding](#model-load-and-embedding)\n",
    "- [The similarity calculation results](#the-similarity-calculation-results)\n",
    "\n",
    "### References\n",
    "\n",
    "- [Ollama](https://ollama.com/)\n",
    "- [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity)\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "\n",
    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
    "\n",
    "**[Note]**\n",
    "- ```langchain-opentutorial``` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
    "- You can checkout the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture --no-stderr\n",
    "%pip install langchain-opentutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages\n",
    "from langchain_opentutorial import package\n",
    "\n",
    "package.install(\n",
    "    [\n",
    "        \"langchain_community\",\n",
    "        \"langchain-ollama\",\n",
    "        \"scikit-learn\",\n",
    "    ],\n",
    "    verbose=False,\n",
    "    upgrade=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ollama Install and Model Serving\n",
    "\n",
    "Ollama is an open-source project that makes it easy to run large language models(LLM) in a local environment. This tool allows users to download and run various LLMs with simple commands, enabling developers to experiment with and use AI models directly on their computers. Ollama is a tool with a user-friendly interface and fast performance, making AI development and experimentation more accessible and efficient.\n",
    "\n",
    "- [Official Website/Installation](https://ollama.com/)\n",
    "\n",
    "### Ollama User Guide\n",
    "**1. Install Ollama** <br>\n",
    "    ![guide1](./assets/guide1.png)\n",
    "\n",
    "**2. Verify Ollama Installation** <br>\n",
    "    ![guide2](./assets/guide2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Identify Supported Embedding Models and Serving Model\n",
    "\n",
    "👇 You can find the model in the hyperlink below.\n",
    "\n",
    "- [Ollama Models](https://ollama.com/search)\n",
    "\n",
    "### Ollama Model Pull Guide\n",
    "\n",
    "**1. Search Models** <br>\n",
    "\n",
    "![guide3](./assets/guide3.png)\n",
    "\n",
    "**2. Pull a Model** <br>\n",
    "\n",
    "![guide4](./assets/guide4.png)\n",
    "\n",
    "**3. Verify the Model** <br>\n",
    "\n",
    "![guide5](./assets/guide5.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model Load and Embedding\n",
    "\n",
    "Now that you have downloaded the model, let's load the model you downloaded and proceed with the embedding.\n",
    "\n",
    "First, define ```Query``` and ```Documents```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Query\n",
    "q = \"Please tell me more about LangChain.\"\n",
    "\n",
    "# Documents for Text Embedding\n",
    "docs = [\n",
    "    \"Hi, nice to meet you.\",\n",
    "    \"LangChain simplifies the process of building applications with large language models.\",\n",
    "    \"The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.\",\n",
    "    \"LangChain simplifies the process of building applications with large-scale language models.\",\n",
    "    \"Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.\",\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's load the embedding model downloaded with ```Ollama``` using ```Langchain```.\n",
    "\n",
    "\n",
    "The ```OllamaEmbeddings``` class in ```langchain_community/embeddings.py``` will be removed in ```langchain-community``` version 1.0.0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load Embedding Model : Legacy\n",
    "from langchain_community.embeddings import OllamaEmbeddings\n",
    "\n",
    "# Serving and load the desired embedding model.\n",
    "ollama_embeddings = OllamaEmbeddings(\n",
    "    model=\"nomic-embed-text\",  # model=<model-name>\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, in this tutorial, we used the ```OllamaEmbeddings``` class from ```langchain-ollama```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load Embedding Model : Latest\n",
    "from langchain_ollama import OllamaEmbeddings\n",
    "\n",
    "# Serving and load the desired embedding model.\n",
    "ollama_embeddings = OllamaEmbeddings(\n",
    "    model=\"nomic-embed-text\",  # model=<model-name>\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use the loaded model to embed the ```Query``` and ```Documents```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Embedding Dimension Output: 768\n"
     ]
    }
   ],
   "source": [
    "# Embedding Query\n",
    "embedded_query = ollama_embeddings.embed_query(q)\n",
    "\n",
    "# Embedding Documents\n",
    "embedded_docs = ollama_embeddings.embed_documents(docs)\n",
    "\n",
    "print(f\"Embedding Dimension Output: {len(embedded_query)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The similarity calculation results\n",
    "\n",
    "Let's use the vector values of the query and documents obtained earlier to calculate the similarity.\n",
    "\n",
    "In this tutorial, we will use [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to calculate the similarity between the ```Query``` and the ```Documents```.\n",
    "\n",
    "Using the ```Sklearn``` library in Python, you can easily calculate **cosine similarity**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Query] Tell me about LangChain.\n",
      "====================================\n",
      "[0] similarity: 0.775 | The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.\n",
      "\n",
      "[1] similarity: 0.748 | LangChain simplifies the process of building applications with large language models.\n",
      "\n",
      "[2] similarity: 0.745 | LangChain simplifies the process of building applications with large-scale language models.\n",
      "\n",
      "[3] similarity: 0.399 | Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.\n",
      "\n",
      "[4] similarity: 0.398 | Hi, nice to meet you.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from sklearn.metrics.pairwise import cosine_similarity\n",
    "\n",
    "# Calculate Cosine Similarity\n",
    "similarity = cosine_similarity([embedded_query], embedded_docs)\n",
    "\n",
    "# Sorting by Similarity in descending order\n",
    "sorted_idx = similarity.argsort()[0][::-1]\n",
    "\n",
    "# Output Result\n",
    "print(\"[Query] Tell me about LangChain.\\n====================================\")\n",
    "for i, idx in enumerate(sorted_idx):\n",
    "    print(f\"[{i}] similarity: {similarity[0][idx]:.3f} | {docs[idx]}\")\n",
    "    print()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "test",
   "language": "python",
   "name": "langchain"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
